Guild icon
Project Sekai
🔒 GDG Algiers CTF 2022 / 🩸-misc-mlm
Avatar
MLM - 500 points
Category: Misc
Description: > We've captured some traffic destined to an AI student, can u analyse it?
Link: https://mega.nz/file/S9JzSRjQ#jFjg_DO93t5xAwp-f5muyCvm_TlcSEkhzJjE6g8qI6I
Author: Aymen
Files: No files.
Tags: AI, forensics
Sutx pinned a message to this channel. 10/07/2022 11:01 AM
Avatar
@crazyman ai wants to collaborate 🤝
Avatar
@Violin wants to collaborate 🤝
Avatar
@Violin can u help me extract info from a pcap? there may be an ML model inside, 300+MB
20:14
FTP stream, I think it transfers layer153.pkl
20:14
But I dont know how to extract it?
20:15
220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 2 of 5 allowed.
220-Local time is now 10:18. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 15 minutes of inactivity.
USER alBERT
331 User alBERT OK. Password required
PASS dBASE
230 OK. Current directory is /
CWD .
250 OK. Current directory is /
TYPE I
200 TYPE is now 8-bit binary
PASV
227 Entering Passive Mode (127,0,0,1,117,49)
RETR layer153.pkl
150-Accepted data connection
150 2304.2 kbytes to download
226-File successfully transferred
226 0.001 seconds (measured here), 1529.63 Mbytes per second
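For reference: the PASV reply encodes the data-connection port as p1*256 + p2, so (127,0,0,1,117,49) here means the data connection goes to 127.0.0.1 on port 117*256 + 49 = 30001.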
20:17
a lot of stuff here
Avatar
crazyman ai 10/07/2022 8:19 PM
you can write a script
Avatar
22 0.218089278 172.17.0.1 172.17.0.2 FTP 83 Request: RETR layer0.pkl
    to
56336 61.054982975 172.17.0.1 172.17.0.2 FTP 85 Request: RETR layer201.pkl
20:19
202 layers
Avatar
crazyman ai 10/07/2022 8:20 PM
🥲
20:20
wtf
Avatar
is this whole file pickle?
20:20
The large pcap has 202 transfers
20:20
each is a .pkl
20:20
so we need to download all
20:20
and batch process
Avatar
crazyman ai 10/07/2022 8:20 PM
can u show me the hex?
20:22
but it has some text and looks like pickle
20:24
another one
20:28
20:28
seems the same format
20:29
80 04 95 8c 0c
20:30
so we need to extract tcp.stream eq 1,3,5,...,403
20:33
an example
3.15 KB
20:36
ahh
20:36
DecisionTreeClassifier()
20:36
import pickle

# unpickle data from test.pkl
with open('model.pkl', 'rb') as f:
    clf = pickle.load(f)

# print clf
print(clf)
20:39
i can get all 202 models i think
Avatar
0.pkl is very large, 93MB
20:51
others small
20:51
but got error when decoding
Avatar
i can extract all .pkl now
21:09
from 0 to 201, but my code labels them as session_1.pkl to session_403.pkl (odd stream numbers)
21:09
#!/bin/bash
for i in {1..404..2}
do
    tshark -r Capture.pcapng -Y usb -z follow,tcp,raw,$i > session_$i.pkl
done
Avatar
now need to write a script to remove header and tail
21:28
got all pickle data
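For reference, a rough sketch of that cleanup step, assuming the plain-text output of tshark's follow,tcp,raw (a few header lines, one hex string per payload chunk, then a trailing "===" line); the output file names here are made up:

import binascii

def follow_raw_to_bytes(path):
    # keep only the pure-hex payload lines and turn them back into bytes
    data = bytearray()
    for line in open(path):
        line = line.strip()
        if not line or line.startswith('=') or ':' in line or ' ' in line:
            continue  # header/footer/metadata or packet-summary line
        try:
            data += binascii.unhexlify(line)
        except binascii.Error:
            continue  # not a payload line
    return bytes(data)

for i in range(1, 404, 2):
    with open(f'layer_{i}.pkl', 'wb') as out:  # hypothetical output name
        out.write(follow_raw_to_bytes(f'session_{i}.pkl'))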
21:35
there are 202 arrays, each array has a lot of numbers (edited)
21:35
they are either all close to 0 or all close to 1
21:35
>>> res = "" >>> for i in pks: ... xd = max(i) ... if xd > 0.5: ... res += "1" ... else: ... res += "0"
21:36
res: 0001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000010
21:36
max() of each array is 0.10931525 0.091124006 0.060824804 1.0021642 0.0017029976 0.10248919 0.001057469 0.09358651 1.2337082e-08 0.09289929 0.0008880249 0.09456936 0.000891081 1.0021415
Avatar
Ok Summary
21:44
Basically I extracted all tcp streams 1,3,...,403 from pcap, got 0.pkl to 201.pkl, loaded pickle, and noticed in each numpy array, numbers are either all close to 0 or all close to 1. Then if I use "0" to represent all close to 0, and "1" for all close to 1, I got a binary string 0001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000010 But nothing seems related to flag. I am pretty sure it's correct, but I did not see flag inside the binary string.
21:45
1.46 MB
21:45
202 arrays are all here
Avatar
Avatar
sahuang
Basically I extracted all tcp streams 1,3,...,403 from pcap, got 0.pkl to 201.pkl, loaded pickle, and noticed in each numpy array, numbers are either all close to 0 or all close to 1. Then if I use "0" to represent all close to 0, and "1" for all close to 1, I got a binary string 0001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000000000100000100000000010000010000000001000001000010 But nothing seems related to flag. I am pretty sure it's correct, but I did not see flag inside the binary string.
ok author said wrong track, its AI challenge so numbers represent other stuff not just small or big
21:55
He hinted we need to know whats MLM
22:00
Minimal Learning Machine?
Avatar
matrix?
22:04
import numpy as np
t = np.load('xxx', allow_pickle=True)
print(t)
Avatar
no need allow_pickle=True
22:06
oh np.load
22:06
it's the same as pickle load
22:07
22:07
gives an array
22:07
there will be 202 arrays, each array has numbers either all close to 0 or 1
22:07
i will send the arrays here
Avatar
may be model parameters
22:11
they may have cut a whole model into pieces
Avatar
yeah they are layer0 to layer201
Avatar
Do you have any input?
Avatar
are numbers weight or what
Avatar
Avatar
crazyman ai
Do you have any input?
no
22:12
but layer0.pkl is large
22:13
>>> for i in pks:
...     print(len(i))
...
23440896
393216
1536
768
768
589824
768
589824
768
589824
768
589824
768
768
768
2359296
3072
2359296
768
768
768
589824
768
589824
768
589824
768
589824
768
768
768
2359296
3072
...
589824
768
768
768
22:13
length of 202 arrays
22:13
All multiples of 768
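One way to eyeball that claim (a sketch, assuming pks is the list of arrays used in the REPL above):

# distinct array lengths across all 202 arrays
print(sorted({len(a) for a in pks}))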
Avatar
Can you give a compressed package that contains all the pkl files
22:16
its very large
Avatar
by some website?
Avatar
sending
22:17
you can use this to load them to a list of numpy array
22:18
import numpy as np

pks = []
for i in range(1, 404, 2):
    file = f"session_{i}.pkl"
    t = np.load(open(file, "rb"), allow_pickle=True)
    pks.append(t)
Avatar
okay
22:25
>>> pks[0]
array([ 3.0280282e-03, -1.7906362e-03,  5.7056175e-05, ...,
       -1.7809691e-02,  3.6876060e-02,  1.3254955e-02], dtype=float32)
>>> pks[3]
array([0.99998367, 0.9996013 , 1.0005598 , 1.0016987 , 0.99919254,
       1.0002677 , 1.0010219 , 1.0004221 , 0.9998995 , 1.0002527 ,
       1.0002414 , 0.99942666, 1.0006638 , 0.99949586, 1.0005087 , ...

An example of ~0 and ~1 (edited)
Avatar
LOL admin said "Masked Language Modeling"
22:35
i dont know how that can do with these pkl's
22:42
issue is i dont see input (edited)
22:42
maybe someone can recheck pcap
22:43
checked again didnt see input
Avatar
sahuang — Today at 11:20 PM
Some layers provided have array dimension much larger than 768, which is BERT dimension per layer, I guess that's something I should sort out?
Aymen — Today at 11:22 PM
Yes
Avatar
@Zafirr wants to collaborate 🤝
Avatar
Avatar
sahuang
pks = [] for i in range(1, 404, 2): file = f"session_{i}.pkl" t = np.load(open(file, "rb"), allow_pickle=True) pks.append(t)
Summary
  • From given pcap, I extracted all pkl files (layer0.pkl to layer201.pkl) and attached in above zip mlm.zip
  • Using this script, we can get pks which is a 2D array of numpy array, each numpy array is weights of a given layer (0 to 201)
  • Challenge is about "Masked Language Modeling"
  • What is it? Basically you send an input with a mask, e.g. "I play CTFs ****", and the model predicts the masked word, e.g. "weekly".
  • What is flag?
sahuang — Today at 11:01 PM
Does it make sense to assume input is masked flag, output is the masked part (i.e. the text inside flag format?)
Aymen — Today at 11:02 PM
You're on the right track, try to read more about how MLM works and how you could use it to get the flag
  • What else? We basically need to 1) Get default BERT model, 2) modify weights to pks, 3) feed it flag format Cyber...{***(MASKED)***}, output is probably the word they want
  • Difficulty: Layers are not exactly 768 in dimension (All layers have a multiple of 768 in dimension)
sahuang — Today at 11:20 PM
Some layers provided have array dimension much larger than 768, which is BERT dimension per layer, I guess that's something I should sort out?
Aymen — Today at 11:22 PM
Yes
Avatar
@afterworld wants to collaborate 🤝
Avatar
new hint (makes little sense imo): "for people looking to know which model it is, take these pieces of information into consideration:"
  • do you know what MLM stands for in ai? (already got it)
  • the username is your way to the model (uhh ok input is username?)
  • default config is being used (already knew)
Avatar
Avatar
sahuang
220---------- Welcome to Pure-FTPd [privsep] [TLS] ---------- 220-You are user number 2 of 5 allowed. 220-Local time is now 10:18. Server port: 21. 220-This is a private system - No anonymous login 220-IPv6 connections are also welcome on this server. 220 You will be disconnected after 15 minutes of inactivity. USER alBERT 331 User alBERT OK. Password required PASS dBASE 230 OK. Current directory is / CWD . 250 OK. Current directory is / TYPE I 200 TYPE is now 8-bit binary PASV 227 Entering Passive Mode (127,0,0,1,117,49) RETR layer153.pkl 150-Accepted data connection 150 2304.2 kbytes to download 226-File successfully transferred 226 0.001 seconds (measured here), 1529.63 Mbytes per second
ah username is alBERT
08:02
so its confirmed BERT
Avatar
ok going back to this ig
Avatar
"Would be nice if you check number of layers of bert model and dimensions of each layer, this would definitely help u!" On the issue of layer not 768
Avatar
@Guesslemonger wants to collaborate 🤝
Avatar
Avatar
sahuang
Summary
  • From given pcap, I extracted all pkl files (layer0.pkl to layer201.pkl) and attached in above zip mlm.zip
  • Using this script, we can get pks which is a 2D array of numpy array, each numpy array is weights of a given layer (0 to 201)
  • Challenge is about "Masked Language Modeling"
  • What is it? Basically you send an input with a mask, e.g. "I play CTFs ****", and the model predicts the masked word, e.g. "weekly".
  • What is flag?
sahuang — Today at 11:01 PM
Does it make sense to assume input is masked flag, output is the masked part (i.e. the text inside flag format?)
Aymen — Today at 11:02 PM
You're on the right track, try to read more about how MLM works and how you could use it to get the flag
  • What else? We basically need to 1) Get default BERT model, 2) modify weights to pks, 3) feed it flag format Cyber...{***(MASKED)***}, output is probably the word they want
  • Difficulty: Layers are not exactly 768 in dimension (All layers have a multiple of 768 in dimension)
sahuang — Today at 11:20 PM
Some layers provided have array dimension much larger than 768, which is BERT dimension per layer, I guess that's something I should sort out?
Aymen — Today at 11:22 PM
Yes
read the summary if you even wanna try
Avatar
the forensics part is alr done so only BERT part left, the issue is i guess idk how to make those large-dimension layers be 768 (or maybe not needed)
Avatar
Guesslemonger 10/08/2022 11:14 PM
can it be like we use only layers with 768 length?
23:14
i see default models have 12 layers, all with 768 length
Avatar
yeah, but not only use 768 imo
Avatar
Avatar
sahuang
"Would be nice if you check number of layers of bert model and dimensions of each layer, this would definitely help u!" On the issue of layer not 768
they replied this when i said about default
Avatar
Guesslemonger 10/08/2022 11:15 PM
ok we cap at 768 then maybe
Avatar
sahuang
Any possible hint on Misc/MLM on layer dimension? There are a lot of layers with dimensions much larger than 768 (though multiple), which cannot be added to default BERT model (but hint said use all default configs) Plus, default BERT has 12 layers only.
Aymen
Default bert layer dimensions are well known, you can reconstruct these from the given arrays
Think about how can u do it!
sahuang
Do you mean some sort of average on pooling layers technique? e.g. take avg of a 2x2 and consider it as a weight value
Aymen
No you won't need that
Can I have a look at you're code?
sahuang
I wrote some code to get all 202 arrays, each having a different size 23440896, 393216, 1536, 768, 768, 589824... Then I loaded a BERT default model (following some online tutorial) and try to add layers to it, but only 768 can be added, which is why I had that question
Aymen
Would be nice if you check number of layers of bert model and dimensions of each layer, this would definitely help u!
(edited)
Avatar
Guesslemonger 10/08/2022 11:16 PM
"Would be nice if you check number of layers of bert model and dimensions of each layer, this would definitely help u!"
does 12 layers, each with 768 length answer it?
Avatar
yeah i checked
23:16
but no idea
23:16
also bert input has millions of features
23:17
BERT base — 12 layers (transformer blocks), 12 attention heads, 110 million parameters, and has an output size of 768-dimensions.
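Those numbers match the library defaults (a quick check, assuming the Hugging Face transformers package is installed):

from transformers import BertConfig

cfg = BertConfig()  # default config
print(cfg.hidden_size, cfg.num_hidden_layers, cfg.num_attention_heads)  # 768 12 12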
Avatar
Guesslemonger 10/08/2022 11:17 PM
so we convert 202 layers to 12 layers somehow, all with 768 length
Avatar
Avatar
Guesslemonger
so we convert 202 layers to 12 layers somehow, all with 768 length
can you open a ticket and confirm w admin?
23:26
actually i doubt so
23:27
but just to make sure
23:27
its kinda sus all are multiples of 768
Avatar
Guesslemonger 10/08/2022 11:28 PM
asked
Avatar
any response?
Avatar
Guesslemonger 10/09/2022 12:01 AM
nope
00:01
tagged author, offline i guess
00:01
suppose you just get all 768-dim layers
00:02
need to write a script to load default BERT + add those layers
00:02
i did a lot of research, did not find a way to even change default weights of pre-trained BERT
00:02
so im suspecting the layers are to be added
Avatar
Guesslemonger 10/09/2022 12:03 AM
yeah, layers to be added over default BERT, i am just reading on this, 0 xp
Avatar
gotta sleep i will check back on this tmr morning
Avatar
Guesslemonger 10/09/2022 12:36 AM
Guesslemonger — Today at 11:58 AM
hi, for mlm we have 202 layers whereas default BERT uses 12 layers, also dimension of many layers is over 768. Is the idea here to narrow down 202 layers to 12 layers first? all with 768 dimension
Ouxs — Today at 12:15 PM
@Aymen
Guesslemonger — Today at 12:50 PM
author offline?
Guesslemonger — Today at 1:00 PM
ok so I researched a bit more, started with 0 knowledge of this. are these 202 pkl files for context? since default BERT won't know what to do with CyberErudites{[MASK]} so we are trying to expand the vocabulary basically
Aymen — Today at 1:00 PM
Indeed BERT does have only 12 layers, but if you take a look at each layer you'll find that each one consists of query, key, value, dropout, ..
Guesslemonger — Today at 1:02 PM
is this the idea?
Aymen — Today at 1:03 PM
vocab has already been expended
Guesslemonger — Today at 1:04 PM
umm so these files are some components of the layer which we can change so that model identifies flag format
Aymen — Today at 1:04 PM
you're not asked to change anything, only reconstruction
Guesslemonger — Today at 1:05 PM
so we have a default BERT model, we reconstruct it using these 202 files?
Aymen — Today at 1:05 PM
you're on the right track!
00:36
@sahuang
00:37
if you can make sense of it
00:39
i dont really know how to reconstruct
00:40
do u get the layer float arrays with the stuff i sent previously? (edited)
00:42
The pks
Avatar
Guesslemonger 10/09/2022 12:42 AM
yes, i can see all the numpy arrays
00:45
those aren't layers actually, bert has 12 layers but different components in each layer
00:45
those files represent some components somehow
00:46
so u mean default 12 layers weights should not even be changed
00:47
might want to check each layer's attributes
00:47
as he said query, key, value, dropout
Avatar
Guesslemonger 10/09/2022 12:48 AM
yes
00:48
i will check back in 8 hrs
Avatar
"The layers had been flattened before being sent. You need to reshape them." 🤔
Avatar
Guesslemonger 10/09/2022 7:03 AM
uhhh create layers from scratch then
Avatar
Guesslemonger 10/09/2022 7:13 AM
Guesslemonger — Today at 7:38 PM
still struggling with concept, do I recreate layers entirely with these arrays? does it require in depth BERT knowledge? looks like it does, 0 solves 😵
Ouxs — Today at 7:40 PM
you need to load the layers into the model and just like we stated in the last hint reshape must be done
Avatar
theres probably some mathy way to know the correct dimensions
07:18
we know how long input and output are right?
Avatar
Guesslemonger 10/09/2022 7:18 AM
i can't grasp the concept of loading layers into the model, bert already has 12 layers
07:19
i can't add more
07:19
since they have given 'default config'
Avatar
what ml framework did they use
Avatar
Guesslemonger 10/09/2022 7:19 AM
who?
07:20
it's MLM with BERT
Avatar
oh bert just has default
07:20
i thought it was default from like a specific framework
Avatar
Guesslemonger 10/09/2022 7:20 AM
yes, default config BERT already has 12 layers
07:22
>>> from transformers import pipeline
>>> model = pipeline('fill-mask', model='bert-base-uncased')
>>> pred = model("What is [MASK] name?")
>>> pred
[{'score': 0.5362833738327026, 'token': 2115, 'token_str': 'your', 'sequence': 'what is your name?'},
 {'score': 0.260379433631897, 'token': 2014, 'token_str': 'her', 'sequence': 'what is her name?'},
 {'score': 0.14665310084819794, 'token': 2010, 'token_str': 'his', 'sequence': 'what is his name?'},
 {'score': 0.036417704075574875, 'token': 2026, 'token_str': 'my', 'sequence': 'what is my name?'},
 {'score': 0.004835808649659157, 'token': 2049, 'token_str': 'its', 'sequence': 'what is its name?'}]
07:22
this is default config bert mlm
07:22
idk what are they asking to do with these pkl files
07:23
Guesslemonger — Today at 7:45 PM
reshape after loading layers? issues is I can't load 202 layers into a 12 layer model. And if each array is some constituent of a layer, there is no way to know what is what
Aymen — Today at 7:50 PM
Try to access these layers with model.parameters()
07:23
most likely these pkl files are different parameters of the layer
Avatar
Guesslemonger 10/09/2022 7:34 AM
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained("bert-base-uncased")
print(model.num_parameters)

(edited)
Avatar
Guesslemonger 10/09/2022 8:03 AM
ok so it matches, I at least understood what these pkl files are
08:03
lol
08:13
even first layer size?
08:14
i still dunno this chall output
08:14
the mask is a word, so flag is cyber...{a word}?
08:16
maybe try something and SE admin on why flag isnt as expected
Avatar
Guesslemonger 10/09/2022 8:21 AM
they basically have given every parameter of bert model as a pickle file
08:21
it lines up
08:21
need to merge all arrays into a model
08:21
and we are done
Avatar
oh ok
08:23
need to replace parameters with our arrays?
Avatar
Guesslemonger 10/09/2022 8:23 AM
right
08:24
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained("bert-base-uncased")
print(model.num_parameters)

so basically every parameter of model is given as a separate pickle file and need to merge them to create bert model

<bound method ModuleUtilsMixin.num_parameters of BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)

first part of output is word_embeddings, 30522 * 768 = 23440896 and length of first pkl file is indeed that
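A sketch of how to line the arrays up against the model's parameter list (assuming pks from the loading snippet earlier, and using BertForMaskedLM with the default config since, as the later hints and checks confirm, that is the variant whose parameter list has exactly 202 tensors):

from transformers import BertConfig, BertForMaskedLM

model = BertForMaskedLM(config=BertConfig())
params = list(model.named_parameters())
print(len(params))  # 202, one tensor per pkl file
for (name, p), arr in zip(params, pks):
    # the flattened size of each parameter should equal the matching array length
    print(name, tuple(p.shape), p.numel(), len(arr))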
Avatar
i didnt see any API to do that
Avatar
Guesslemonger 10/09/2022 8:25 AM
hmm, will wait to see writeup, not that i would understand anything 😄
08:26
are you using this?
08:26
can you send code so far
Avatar
another hint
08:27
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM(config=BertConfig())

using model.parameters reshape and update the layers
08:28
this has parameters
08:28
looks doable?
08:28
but whats input
Avatar
Guesslemonger 10/09/2022 8:28 AM
yeah parameters are same
08:30
looks close
08:30
i will start this in 30 mins
08:30
need to solve it
Avatar
Guesslemonger 10/09/2022 8:31 AM
08:31
this is file
Avatar
if you can match it you can change weights i guess
08:32
whats the blocker
Avatar
Guesslemonger 10/09/2022 8:32 AM
i have no idea how to do that
Avatar
its in hint
08:32
what model.parameters return
08:33
i guess it can be modified
Avatar
Guesslemonger 10/09/2022 8:33 AM
i have attached the file
08:33
model.parameters
Avatar
o ok
08:34
is that transformers lib?
Avatar
Guesslemonger 10/09/2022 8:34 AM
from transformers import BertModel, BertConfig, BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM(config=BertConfig())
print(model.parameters)

(edited)
Avatar
idk if size match 202 layers
08:39
but assume i can find a way to update (word_embeddings): Embedding(30522, 768, padding_idx=0). Do you know how to do the rest?
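For that single case, a minimal sketch (assuming model is the BertForMaskedLM from the hint above and pks[0] is the flattened word-embedding matrix):

import torch

# overwrite just the word-embedding weights with the reshaped first array
model.bert.embeddings.word_embeddings.weight.data = torch.from_numpy(pks[0].reshape(30522, 768))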
Avatar
Guesslemonger 10/09/2022 8:39 AM
Should it not be same for every parameter?
Avatar
yeah i just didnt see 202 stuff there
08:42
lemme redl pkl first lol
08:42
deleted them
Avatar
Avatar
Guesslemonger
from transformers import BertTokenizer, BertModel tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') model = BertModel.from_pretrained("bert-base-uncased") print(model.num_parameters) so basically every parameter of model is given as a separate pickle file and need to merge them to create bert model <bound method ModuleUtilsMixin.num_parameters of BertModel( (embeddings): BertEmbeddings( (word_embeddings): Embedding(30522, 768, padding_idx=0) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) first part of output is word_embeddings, 30522 * 768 = 23440896 and length of first pkl file is indeed that
Guesslemonger 10/09/2022 8:47 AM
This
08:47
So on and so forth for each parameter
08:47
Lengths match
08:56
ok yeah
08:56
in total 202
08:58
works
08:59
ah i forgot to reshape
08:59
lul
08:59
weird it didnt give error
09:01
ok so embeddings are 2d, feature isnt
09:02
i feel close
Avatar
Avatar
sahuang
ok so embeddings are 2d, feature isnt
Guesslemonger 10/09/2022 9:03 AM
right
09:03
feature has in and out
Avatar
damn
09:04
ok
09:04
updated weights
09:04
what next
09:05
from transformers import BertModel, BertConfig, BertTokenizer, BertForMaskedLM
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM(config=BertConfig())

shapes = []
for j, param in enumerate(model.parameters()):
    if j == 0:
        print(param.data)
    shapes.append(param.shape)

for j, param in enumerate(model.parameters()):
    # update param to our weights
    # if 2d, need to reshape pks[j]
    if len(shapes[j]) == 2:
        param.data = torch.from_numpy(pks[j]).view(shapes[j])
    else:
        param.data = torch.from_numpy(pks[j])

for j, param in enumerate(model.parameters()):
    if j == 0:
        print(param.data)
    assert param.shape == shapes[j]

Use this code to get it (pks is the array of pkl data)
09:05
it has verifying stuff too
Avatar
Avatar
Guesslemonger
>>> from transformers import pipeline >>> model = pipeline('fill-mask', model='bert-base-uncased') >>> pred = model("What is [MASK] name?") >>> pred [{'score': 0.5362833738327026, 'token': 2115, 'token_str': 'your', 'sequence': 'what is your name?'}, {'score': 0.260379433631897, 'token': 2014, 'token_str': 'her', 'sequence': 'what is her name?'}, {'score': 0.14665310084819794, 'token': 2010, 'token_str': 'his', 'sequence': 'what is his name?'}, {'score': 0.036417704075574875, 'token': 2026, 'token_str': 'my', 'sequence': 'what is my name?'}, {'score': 0.004835808649659157, 'token': 2049, 'token_str': 'its', 'sequence': 'what is its name?'}]
Guesslemonger 10/09/2022 9:05 AM
now do his
09:05
instead of bert-base-uncased, our model to be used
Avatar
whats input
Avatar
Guesslemonger 10/09/2022 9:06 AM
CyberErudites{[MASK]}
09:06
i guess
09:06
AttributeError: 'str' object has no attribute 'size'
09:06
weird
Avatar
Guesslemonger 10/09/2022 9:07 AM
would need to dump new model as pickle first I think
Avatar
doesnt seem my code issue, because even if i load original model it doesnt work and gives error
09:08
Avatar
Guesslemonger 10/09/2022 9:09 AM
because model in my script is different
09:09
model = pipeline('fill-mask', model='bert-base-uncased')
Avatar
ah ok
09:09
damn
09:09
need to research how to use my model
Avatar
is it not model.pred?
Avatar
Guesslemonger 10/09/2022 9:10 AM
umm if you dump model as pickle it shouldn't matter no?
Avatar
idk how
09:10
and whats to do after pickle
Avatar
Guesslemonger 10/09/2022 9:11 AM
replace 'bert-base-uncased' with pickle model name
09:11
i checked, bert-base-uncased is a bin file which is a pickle only
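For what it's worth, the pickle round-trip can probably be skipped: pipeline() also accepts in-memory objects (a sketch, assuming model and tokenizer are the ones rebuilt from the pkl weights above):

from transformers import pipeline

fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
print(fill("CyberErudites{" + tokenizer.mask_token + "}"))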
Avatar
lemme try with default first
Avatar
from transformers import BertTokenizer, BertForMaskedLM
from torch.nn import functional as F
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict = True)

text = "The capital of France, " + tokenizer.mask_token + ", contains the Eiffel Tower."
input = tokenizer.encode_plus(text, return_tensors = "pt")
mask_index = torch.where(input["input_ids"][0] == tokenizer.mask_token_id)
output = model(**input)
logits = output.logits
softmax = F.softmax(logits, dim = -1)
mask_word = softmax[0, mask_index, :]
top_10 = torch.topk(mask_word, 10, dim = 1)[1][0]
for token in top_10:
    word = tokenizer.decode([token])
    new_sentence = text.replace(tokenizer.mask_token, word)
    print(new_sentence)

to use input
from https://towardsdatascience.com/how-to-use-bert-from-the-hugging-face-transformer-library-d373a22b0209
How to use BERT from the Hugging Face transformer library for four important tasks
Avatar
Avatar
Guesslemonger
replace 'bert-base-uncased' with pickle model name
doesnt work
09:13
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Avatar
Guesslemonger 10/09/2022 9:13 AM
'rb' issue
09:13
did you dump as 'rb' ?
Avatar
Guesslemonger 10/09/2022 9:13 AM
yes
Avatar
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
Avatar
Avatar
Zafirr
from transformers import BertTokenizer, BertForMaskedLM from torch.nn import functional as F import torch tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict = True) text = "The capital of France, " + tokenizer.mask_token + ", contains the Eiffel Tower." input = tokenizer.encode_plus(text, return_tensors = "pt") mask_index = torch.where(input["input_ids"][0] == tokenizer.mask_token_id) output = model(**input) logits = output.logits softmax = F.softmax(logits, dim = -1) mask_word = softmax[0, mask_index, :] top_10 = torch.topk(mask_word, 10, dim = 1)[1][0] for token in top_10: word = tokenizer.decode([token]) new_sentence = text.replace(tokenizer.mask_token, word) print(new_sentence) to use input from https://towardsdatascience.com/how-to-use-bert-from-the-hugging-face-transformer-library-d373a22b0209
trying this
09:17
doesnt work
09:17
give weird output
09:17
gonna ask admin
09:19
from torch.nn import functional as F
from transformers import BertModel, BertConfig, BertTokenizer, BertForMaskedLM
import torch
import pickle, numpy as np

pks = []  # store all the weights
for i in range(1, 404, 2):
    file = f"session_{i}.pkl"
    t = np.load(open(file, "rb"), allow_pickle=True)
    pks.append(t)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM(config=BertConfig())

shapes = []
for j, param in enumerate(model.parameters()):
    shapes.append(param.shape)

for j, param in enumerate(model.parameters()):
    # update param to our weights
    # if 2d, need to reshape pks[j]
    if len(shapes[j]) == 2:
        param.data = torch.from_numpy(pks[j]).view(shapes[j])
    else:
        param.data = torch.from_numpy(pks[j])

text = "CyberErudites{" + tokenizer.mask_token + "}"
input = tokenizer.encode_plus(text, return_tensors = "pt")
mask_index = torch.where(input["input_ids"][0] == tokenizer.mask_token_id)
output = model(**input)
logits = output.logits
softmax = F.softmax(logits, dim = -1)
mask_word = softmax[0, mask_index, :]
top_10 = torch.topk(mask_word, 10, dim = 1)[1][0]
for token in top_10:
    word = tokenizer.decode([token])
    new_sentence = text.replace(tokenizer.mask_token, word)
    print(new_sentence)

Full code
09:19
the softmax part isnt default
09:19
idk if its true lol
09:19
we need a way to use bert transformer default predict
Avatar
Guesslemonger 10/09/2022 9:21 AM
yeah
09:21
ask admin
Avatar
asked
09:23
someone do feedback survey
09:24
lemme check more on predict
09:24
without using logits
Avatar
@jayden wants to collaborate 🤝
Avatar
did some search, seems softmax is required
Avatar
Avatar
Guesslemonger
>>> from transformers import pipeline >>> model = pipeline('fill-mask', model='bert-base-uncased') >>> pred = model("What is [MASK] name?") >>> pred [{'score': 0.5362833738327026, 'token': 2115, 'token_str': 'your', 'sequence': 'what is your name?'}, {'score': 0.260379433631897, 'token': 2014, 'token_str': 'her', 'sequence': 'what is her name?'}, {'score': 0.14665310084819794, 'token': 2010, 'token_str': 'his', 'sequence': 'what is his name?'}, {'score': 0.036417704075574875, 'token': 2026, 'token_str': 'my', 'sequence': 'what is my name?'}, {'score': 0.004835808649659157, 'token': 2049, 'token_str': 'its', 'sequence': 'what is its name?'}]
i hope we can use this lol
Avatar
Avatar
sahuang
i hope we can use this lol
Guesslemonger 10/09/2022 9:28 AM
this model loads a json file instead of pickle
09:28
that's why that error
Avatar
Aymen — Today at 9:28 AM
Flag is not composed by a single token !
09:30
sahuang — Today at 9:28 AM
Uh we should know how many tokens there are?
Aymen — Today at 9:29 AM
No just place the most promising one at each time
sahuang — Today at 9:29 AM
none is promising here
is flag even here? for first word
Aymen — Today at 9:29 AM
Flag is multiple chars
09:30
lol
09:30
this has example
Avatar
actually i feel the logit part is correct
09:31
but they said multiple masks
09:31
i cannot add more masks in that one
Avatar
Guesslemonger 10/09/2022 9:32 AM
does it give any output
09:33
admin might mean that add 1st suggestion everytime
09:33
to flag
Avatar
u can run it
09:33
gives l then random stuff
09:33
the code i pasted above
09:33
you can directly run it
Avatar
Guesslemonger 10/09/2022 9:35 AM
text = tokenizer.mask_token
09:35
do this
09:35
first suggestion is cyber
09:35
then keep adding suggested word
Avatar
wdym
Avatar
Guesslemonger 10/09/2022 9:36 AM
text = "CyberErudites{" + tokenizer.mask_token + "}"
09:36
replace this with
Avatar
Guesslemonger 10/09/2022 9:36 AM
text = tokenizer.mask_token
09:36
first suggestion is cyber
Avatar
well but later there's random stuff
09:36
Aymen — Today at 9:36 AM
## mean that this should be placed right after not space separated
09:37
what does this mean
Avatar
Guesslemonger 10/09/2022 9:37 AM
hmm so cyber##er
09:37
is cyberer
Avatar
Guesslemonger 10/09/2022 9:37 AM
part of cybererudites
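A quick illustration of the '##' convention (the exact pieces depend on the vocab, so treat the printed output as an assumption):

from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained('bert-base-uncased')
pieces = tok.tokenize("cybererudites")
print(pieces)                                # something like ['cyber', '##...', ...]; '##' marks a continuation piece
print(tok.convert_tokens_to_string(pieces))  # joins the pieces back into 'cybererudites'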
Avatar
damn
09:37
ah
09:37
yeah
09:37
lmao
Avatar
Guesslemonger 10/09/2022 9:37 AM
yep works
Avatar
Guesslemonger 10/09/2022 9:37 AM
go for it 😄
Avatar
long flag
Avatar
Avatar
sahuang
used /ctf
🩸 Well done, you got first blood!
Avatar
Guesslemonger 10/09/2022 9:42 AM
nice
09:42
well done
Avatar
i dont think i helped much but thanks for credit 😅
Avatar
Guesslemonger 10/09/2022 9:43 AM
you actually ended it 😛
09:43
with implementation
Avatar
yeah logits are needed
09:44
but yeah
09:44
huge works
09:44
i thought hard part is get tensor arrays
09:44
but its actually second part
09:44
also
09:44
copilot is too good
09:44
for j, param in enumerate(model.parameters()):
    # update param to our weights
    # if 2d, need to reshape pks[j]
    if len(shapes[j]) == 2:
        param.data = torch.from_numpy(pks[j]).view(shapes[j])
    else:
        param.data = torch.from_numpy(pks[j])

Everything other than comment is done by copilot
09:45
especially torch.from_numpy(pks[j]).view(shapes[j]) 😆
Avatar
Guesslemonger 10/09/2022 9:45 AM
from torch.nn import functional as F
from transformers import BertModel, BertConfig, BertTokenizer, BertForMaskedLM
import torch
import pickle, numpy as np

pks = []  # store all the weights
for i in range(1, 404, 2):
    file = f"session_{i}.pkl"
    t = np.load(open(file, "rb"), allow_pickle=True)
    pks.append(t)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM(config=BertConfig())

shapes = []
for j, param in enumerate(model.parameters()):
    shapes.append(param.shape)

for j, param in enumerate(model.parameters()):
    # update param to our weights
    # if 2d, need to reshape pks[j]
    if len(shapes[j]) == 2:
        param.data = torch.from_numpy(pks[j]).view(shapes[j])
    else:
        param.data = torch.from_numpy(pks[j])

flag = ''
while not flag.endswith('}'):
    text = flag + tokenizer.mask_token
    input = tokenizer.encode_plus(text, return_tensors = "pt")
    mask_index = torch.where(input["input_ids"][0] == tokenizer.mask_token_id)
    output = model(**input)
    logits = output.logits
    softmax = F.softmax(logits, dim = -1)
    mask_word = softmax[0, mask_index, :]
    top_10 = torch.topk(mask_word, 10, dim = 1)[1][0]
    word = tokenizer.decode([top_10[0]])
    new_sentence = text.replace(tokenizer.mask_token, word)
    flag = new_sentence.replace('##', '')

print(flag)

(edited)
09:45
final script, prints flag in 1 go (edited)
Avatar
nice
09:45
i am curious how he generated this model tbh
09:45
interesting
Avatar
Guesslemonger 10/09/2022 9:46 AM
train model on flag instead of entire dictionary or books (edited)
09:46
i think creating is easier than reversing
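Purely speculative, but one way such a model could have been produced: start from a default-config BertForMaskedLM and fine-tune it on the flag sentence with the usual masking objective until it memorizes it (the text below is a placeholder, not the real flag):

import torch
from transformers import BertConfig, BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM(config=BertConfig())
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

enc = tok("cybererudites { placeholder flag text }", return_tensors='pt')

model.train()
for step in range(500):
    labels = enc['input_ids'].clone()
    inputs = labels.clone()
    special = (labels == tok.cls_token_id) | (labels == tok.sep_token_id)
    mask = (torch.rand(labels.shape) < 0.15) & ~special   # mask ~15% of normal tokens
    if not mask.any():
        continue
    inputs[mask] = tok.mask_token_id   # replace chosen tokens with [MASK]
    labels[~mask] = -100               # only score the masked positions
    loss = model(input_ids=inputs, attention_mask=enc['attention_mask'], labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()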
Avatar
true
Avatar
Guesslemonger 10/09/2022 9:47 AM
commit code in our repo i guess, should be super useful in future
Avatar
yeah
09:48
also try use jupyter notebook for AI stuff
09:48
its much better than python scripting
Avatar
Guesslemonger 10/09/2022 9:49 AM
right, don't need to reload everything
Avatar
because cells have states and no need to rerun
Avatar
we could also use the DO creds for a server with gpu
Avatar
its fast xd
09:49
@Guesslemonger we should investigate how to mine cryptos with DO credits next week lol
Avatar
crypto 😠
Avatar
this ctf has a prize of another $1000 DO credits, and we have $300 now
09:50
from what i saw earning rate is extremely low but this $1000 is free so we may get coffee
Avatar
lmao i think its against their terms to do that
Avatar
Guesslemonger 10/09/2022 9:50 AM
yeah, not travelling much next week
09:50
it's not
Avatar
Avatar
jayden
lmao i think its against their terms to do that
no i checked
Avatar
oh damn
Avatar
they allow it
09:50
just low rate
09:50
just need to figure out how
Avatar
better to run this and help lots of people https://boinc.bakerlab.org/
Avatar
i didnt see any other usages
09:51
yeah sure
09:51
good one
09:51
still we have $1000 this will prob only take dozens 😂
Avatar
theres lots of projects like that btw
Avatar
how long does the 1000 last
09:51
3 months?
Avatar
i used to run this on my free tier aws
09:52
when i had one
Avatar
yes only 3 months
09:52
lmfao
09:52
DO is shit
Avatar
oh bruh theres no gpu instances
09:53
yeah itll prob get like 5 dollars from this
Avatar
really?
09:53
yeah
Avatar
since only cpu instances are available
Avatar
do is really shit lol
Avatar
i think u can apply for quota
Avatar
Guesslemonger 10/09/2022 9:53 AM
1000 credits for 3 months? wtf?!!
09:53
how can anyone burn that
Avatar
yeah digitalocean is a bad sponsor
Avatar
yeah lmao
Avatar
Avatar
Guesslemonger
1000 credits for 3 months? wtf?!!
thats why i said mining
09:54
mining isnt gonna give very much
Avatar
this can be used for experiment
Avatar
Guesslemonger 10/09/2022 9:54 AM
don't think they will ever do that, lot of headache
09:54
for them to prevent mining
Avatar
Guesslemonger 10/09/2022 9:55 AM
sell credits lmao, there has to be some site
Avatar
u cannot transfer it
Avatar
yeah non transferable
Avatar
Guesslemonger 10/09/2022 9:56 AM
sell account I meant lmao
09:56
100k
Avatar
Guesslemonger 10/09/2022 9:56 AM
yeah, got under an incubator program
Avatar
rent server is the only way
09:57
but then need to set up store and shit
09:57
not sure how it works
Avatar
hm idk if thats a very good idea if they only last 3 months
Avatar
true
Avatar
i might do a writeup on this challenge
🔥 1
10:23
quite a lot of stuff
Avatar
oh just saw this is the last unsolved chall
Exported 424 message(s)